Towards Arabic Spell-Checker Based on N-Grams Scores
نویسندگان
چکیده
منابع مشابه
Parallel Spell-Checking Algorithm Based on Yahoo! N-Grams Dataset
Spell-checking is the process of detecting and sometimes providing suggestions for incorrectly spelled words in a text. Basically, the larger the dictionary of a spell-checker is, the higher is the error detection rate; otherwise, misspellings would pass undetected. Unfortunately, traditional dictionaries suffer from out-of-vocabulary and data sparseness problems as they do not encompass large ...
متن کاملImproving Finite-State Spell-Checker Suggestions with Part of Speech N-Grams
We demonstrate a finite-state implementation of context-aware spell checking utilizing an N-gram based part of speech (POS) tagger to rerank the suggestions from a simple edit-distance based spell-checker. We demonstrate the benefits of context-aware spellchecking for English and Finnish and introduce modifications that are necessary to make traditional N-gram models work for morphologically mo...
متن کاملCorpus-Based Arabic Stemming Using N-Grams
In languages with high word inflation such as Arabic, stemming improves text retrieval performance by reducing words variants. We propose a change in the corpus-based stemming approach proposed by Xu and Croft for English and Spanish languages in order to stem Arabic words. We generate the conflation classes by clustering 3-gram representations of the words found in only 10% of the data in the ...
متن کاملTowards a Phonetic Brazilian Portuguese Spell Checker
Spell checking is no longer considered a big challenge for natural language processing, at least regarding the task of correcting documents during edition. Nevertheless, without human interaction, it is necessary to automatically choose the word that will more likely correct the misspelled word. Also, there is a further difficulty for spell checking: new types of errors on the web material have...
متن کاملAuthor Profiling for Arabic Tweets based on n-grams
This paper presents an approach for author profiling of an unknown users from their texts produced in social media. In particular, we address the identification of two profile dimensions: gender and language variety, of Arabic twitter users based on their tweets. Our approach focused on applying metaclassification technique on features extracted from tweets body. We explored two main sets of fe...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Computer Applications
سال: 2012
ISSN: 0975-8887
DOI: 10.5120/8400-2168